[57] ABSTRACT

Modem-equipped computers which can initiate an audio 
channel using the modem data connection. The connection 
is initiated with a new protocol called the voice-over-data 
protocol. The new protocol does not require any additional 
modem hardware or telephone line features, and is not tied 
to any proprietary hardware/software compression or trans- 
mission schemes. The voice-over-data protocol negotiates 
an audio compression/decompression scheme and then sets 
up an audio channel over an existing data connection using 
a socket. Compressed audio data is then delivered to the 
remote computer where it is decompressed and output. The 
voice-over-data protocol significantly reduces the latency 
which disrupts normal speech patterns when voice data is 
sent over a data connection. This protocol also reduces the
bandwidth required to send voice over a data connection.
METHOD AND SYSTEM FOR AUDIO
COMPRESSION NEGOTIATION FOR 
MULTIPLE CHANNELS 

FIELD OF THE INVENTION 

This invention relates generally to the field of computer 
communications. More specifically, it relates to a method 
and system providing simultaneous transmission of voice/ 
audio and data with standard communication devices. 

BACKGROUND AND SUMMARY OF THE 
INVENTION 

In today's world, modems are commonly used to connect 
a computer at one location to a computer at another remote 
location. A majority of computer users use a modem to 
establish a data connection directly to another computer, or 
to a computer which allows access to a network of comput- 
ers (e.g., the Internet). The modem connection is typically 
made over a telephone line (e.g., a plain old telephone 
service (POTS) line) that is normally used for voice calls. 
The user modem data connection to another computer is 
typically a data connection that does not permit voice traffic. 
If a user wants to talk with anyone, the data connection must 
be dropped, and a voice connection established. When the 
voice conversation is finished, the voice connection is 
dropped, and the data connection must then be 
re-established, a tedious and time consuming process. 

It is desirable for many types of applications to allow a 
voice connection to co-exist on the same telephone line that 
is being used as a data connection between two modems. 
This voice connection can be used for a number of purposes,
such as permitting a user to get live help from a support 
organization after calling a support bulletin board, to order 
merchandise after viewing an electronic catalog, to play an 
interactive game with another computer user, etc. Since 
voice transmission can generate large bursts of voice 
information, compression/decompression techniques are 
typically required to speed the transmission of voice infor- 
mation. However, most packet networks with modem con- 
nections are simply not capable of transmitting effective 
voice communications in real-time (even with compression/ 
decompression) over a data connection that has been estab- 
lished between two computers due to high latency (i.e., time 
delays of 200-500 milliseconds) and limited bandwidth. 

It is also desirable for an application (e.g., a computer 
game, an electronic music store, etc.) to transmit high 
fidelity audio along with data. As the transmission speed of 
modems increases (e.g., from 9600 and 14,400 (14.4) to 
28,800 (28.8) bits-per-second), it is now possible to rou- 
tinely add high fidelity audio to applications. A voice chan- 
nel (or audio channel) could be used to transmit this high 
fidelity audio associated with the application. High fidelity
audio data also requires that compression/decompression
techniques be used, since the amount of high fidelity audio
data sent can be quite large. However, as was described above for
voice, real-time latency and bandwidth problems prevent 
most packet networks with modem connections from trans- 
mitting high fidelity audio over a data connection. 
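
As a rough check on the figures involved, the following sketch (in C, with hypothetical function names that do not appear in the source) computes the raw bit rate of uncompressed audio and tests whether it fits a modem link after compression. It assumes single-channel audio, takes "11 KHz" to mean 11,025 samples per second, and uses the 16-bit resolution and the 8:1 codec ratio cited later in this description. Modem framing and protocol overhead are ignored.

```c
#include <stdint.h>

/* Raw (uncompressed) PCM bit rate in bits per second. */
uint32_t pcm_bits_per_second(uint32_t sample_rate_hz,
                             uint32_t bits_per_sample,
                             uint32_t channels)
{
    return sample_rate_hz * bits_per_sample * channels;
}

/* Returns 1 if a stream of raw_bps, compressed by ratio:1 (an integer
   ratio such as 8 for 8:1), fits within modem_bps; otherwise returns 0.
   This is an idealized comparison that ignores all link overhead. */
int fits_modem(uint32_t raw_bps, uint32_t ratio, uint32_t modem_bps)
{
    return raw_bps / ratio <= modem_bps;
}
```

Under these assumptions, 11,025 Hz voice compressed 8:1 needs 22,050 bits per second and fits a 28,800 bit-per-second modem, whereas 44.1 KHz audio at the same ratio needs 88,200 bits per second and does not.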

There have been many attempts to permit voice/data 
and/or high fidelity audio/data to be transmitted on the same 
telephone line used to make a modem connection. One 
example of voice/data transmission is the VoiceView™
modem by Radish Communications Systems, Inc. of
Boulder, Colo., which uses software to permit alternating
voice and data (AVD). At any one time, either voice or data
can be transmitted over the connection between the two
modems. However, voice communications are awkward for
users since the voice channel stops when data is sent. An
additional problem is that both ends of the connection must
have the special VoiceView™ modems. If VoiceView™
modems aren't present on both ends, then the alternating
voice and data connection is not possible.

Another technique used to overcome the voice/data problem
is using simultaneous voice and data (SVD) modems
developed by Intel®, Rockwell®, Multitech®, and others.
The SVD modems use special modem hardware to
provide simultaneous voice and data over a telephone line at
any instant of time. The simultaneous voice and data
modems allow a single channel of voice (or audio) to
co-exist with a data stream. However, multiple channels of
voice are not supported. Moreover, this solution requires
significant computational abilities in the modem hardware to
compress/decompress and multiplex/demultiplex the voice
data stream, as well as a protocol for mixing the data and
voice/audio streams. The special modem hardware
significantly increases the cost of the modem, and uses
proprietary compression and protocol schemes which are
incompatible with most other existing modem hardware. As a
result, both ends of the connection must have the
specially-equipped modems to permit simultaneous voice/audio
and data traffic over a single telephone line. In addition,
not all SVD modems are compatible with other SVD modems
(e.g., a Multitech® SVD modem will not communicate with an
Intel® SVD modem).

Another variety of the simultaneous voice and data
"modems" is an Integrated Services Digital Network (ISDN)
device. ISDN devices provide simultaneous voice and data
transmission, but are significantly more expensive than a
standard modem. In addition, ISDN devices typically
require a special telephone line (i.e., an ISDN line) to take
full advantage of the ISDN modem features. To use
simultaneous voice and data, a user needs an ISDN device, and an
ISDN telephone line (which requires an additional monthly
fee) instead of a normal telephone line.

Half-duplex voice has also been used to provide voice
traffic over a data connection on a broadcast computer
network such as the Internet. However, these half-duplex
network products (e.g., the InternetPhone™) do not
allow an immediate transition between speaking and
listening. This dramatically interrupts natural speech patterns. The
variations in the time required to send data (including voice
data) across a broadcast computer network such as the
Internet (e.g., over 1 second) make it virtually impossible to
overcome latency during a voice connection.

In accordance with a preferred embodiment of the present
invention, the simultaneous voice/audio and data problem
using standard modems is overcome. A new protocol, called
the "voice-over-data protocol", provides simultaneous,
full-duplex voice and data over a standard modem data
connection, using a single telephone line. The voice-over-data
protocol does not require any new, special, or proprietary
modem hardware, and utilizes sockets, a standard
operating system communication component, for the
transport.
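
As an illustration only of what a datagram-style socket looks like to an application, the following sketch sends one packet without any connection handshake or acknowledgement step. It assumes POSIX sockets rather than the Windows® sockets of the described embodiment, and it uses a local AF_UNIX socket pair purely so that the sketch runs without a modem or network; the function name dgram_roundtrip is hypothetical. A local pair does not lose packets, so the sketch shows the interface, not the delivery behavior of a modem link.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Sends one datagram from one end of a local datagram socket pair to the
   other and copies what arrives into out (at most cap bytes).
   Returns the number of bytes received, or -1 on failure. */
long dgram_roundtrip(const void *msg, size_t len, void *out, size_t cap)
{
    int fds[2];
    long n = -1;

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, fds) != 0)
        return -1;

    /* send() hands the packet to the transport and returns; the sender
       does not wait for the receiver to acknowledge it. */
    if (send(fds[0], msg, len, 0) == (ssize_t)len)
        n = (long)recv(fds[1], out, cap, 0);

    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Each send() produces one self-contained packet, which is the property the description later relies on when it selects datagram or raw sockets over stream sockets.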

Voice-over-data uses a combined single protocol to
handle both voice/audio and data. The voice-over-data
protocol is designed to allow a variety of non-proprietary
compression/decompression techniques to be used for
simultaneous voice/audio and data transfer, and also
provides the capability for multiple voice/audio channels to be
transmitted over a single telephone line. This new protocol
dramatically improves the latency between the speaker and
the listener (i.e., the time delays are reduced to 50-100
milliseconds), allowing for more natural speech patterns.

The foregoing and other features and advantages of the
preferred embodiment of the present invention will be more
readily apparent from the following detailed description,
which proceeds with reference to the accompanying
drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a computer system used to
implement a preferred embodiment of the present invention.

FIG. 2 is a block diagram illustration of the computer
equipment and connections which can be used for the
illustrated embodiment of the present invention.

FIG. 3 is a block diagram showing the prior art ISO
network reference model.

FIG. 4A is a block diagram which illustrates a stripped
down version of the OSI network model.

FIG. 4B is the preferred embodiment model for the
present invention.

FIG. 5 is a block diagram which illustrates data flow from
a local audio input device to a remote audio output device.

FIGS. 6A and 6B together are a flow chart which illustrates
the data path of a compression format negotiation message.

FIGS. 7A, 7B, and 7C together are a flow chart which
illustrates the compression format negotiation sequence.

FIGS. 8A, 8B and 8C together are a flow chart which
illustrates how compressed audio information is sent from a
local computer to a remote computer.

DETAILED DESCRIPTION OF A PREFERRED
EMBODIMENT

Referring to FIG. 1, an operating environment for the
preferred embodiment of the present invention is a computer
system 10 with a computer 12 that comprises at least one
high speed processing unit (CPU) 14, in conjunction with a
memory system 16, an input device 18, and an output device
20. These elements are interconnected by a bus structure 22.

The illustrated CPU 14 is of familiar design and includes
an ALU 24 for performing computations, a collection of
registers 26 for temporary storage of data and instructions,
and a control unit 28 for controlling operation of the system
10. Any of a variety of processors, including those from
Digital Equipment, Sun, MIPS, IBM, Motorola, NEC, Intel,
Cyrix, AMD, Nexgen and others are equally preferred for
CPU 14. Although shown with one CPU 14, computer
system 10 may alternatively include multiple processing
units.

The memory system 16 includes main memory 30 and
secondary storage 32. Illustrated main memory 30 is high
speed random access memory (RAM) and read only
memory (ROM). Main memory 30 can include any
additional or alternative high speed memory device or memory
circuitry. Secondary storage 32 takes the form of long term
storage, such as ROM, optical or magnetic disks, organic
memory or any other volatile or non-volatile mass storage
system. Those skilled in the art will recognize that memory
16 can comprise a variety and/or combination of alternative
components.

The input and output devices 18, 20 are also familiar. The
input device 18 can comprise a keyboard, mouse, pointing
device, sound device (e.g., a microphone, etc.), or any other
device providing input to the computer system 10. The
output device 20 can comprise a display, a printer, a sound
device (e.g., a speaker, etc.), or other device providing
output to the computer system 10. The input/output devices
18, 20 can also include network connections, modems, or
other devices used for communications with other computer
systems or devices.

As is familiar to those skilled in the art, the computer
system 10 further includes an operating system and at least
one application program. The operating system is a set of
software which controls the computer system's operation
and the allocation of resources. The application program is
a set of software that performs a task desired by the user,
making use of computer resources made available through
the operating system. Both are resident in the illustrated
memory system 16.

In accordance with the practices of persons skilled in the
art of computer programming, the present invention is
described below with reference to acts and symbolic
representations of operations that are performed by computer
system 10, unless indicated otherwise. Such acts and
operations are sometimes referred to as being
computer-executed.

It will be appreciated that the acts and symbolically
represented operations include the manipulation by the CPU
14 of electrical signals representing data bits which causes
a resulting transformation or reduction of the electrical
signal representation, and the maintenance of data bits at
memory locations in memory system 16 to thereby
reconfigure or otherwise alter the computer system's operation,
as well as other processing of signals. The memory locations
where data bits are maintained are physical locations that
have particular electrical, magnetic, optical or organic
properties corresponding to the data bits.

As is shown in FIG. 2, the illustrated embodiment of the
present invention includes a local computer 34 and a remote
computer 40, each with an associated modem 36, coupled
over a communications link 38. The modems 36
are standard high speed modems of the sort made by
Hayes®, U.S. Robotics®, Motorola®, Multi-Tech™,
Zoom™, Practical Peripherals™, etc., for use on a standard
(i.e., not specialized) telephone line. However, specialized
modems or other communication devices (e.g., ISDN, etc.)
and specialized telephone lines (e.g., ISDN, etc.) can also be
used. The modem may be an external modem 36, or an
internal modem (not shown in FIG. 2) connected to either a
serial or parallel port on the computer. The communications
links 38 are standard telephone lines. The user and remote
computers each have audio devices 42 (e.g., a telephone,
speaker, microphone, etc.) through which a user can
send/receive voice and/or audio information.

The local computer 34 and the remote computer 40 have
an operating system such as 4.x Berkeley UNIX™,
Windows® 95, Windows NT™, etc. which supports sockets as
a mechanism for communications. The operating system
permits a plurality of application programs to be run, and
also permits a local application program to communicate
with a remote application program through a layered
software communication component 44.

When communication is established between two
computers, a layered communications hierarchy is often
used. One layered communications hierarchy commonly
used is the ISO OSI reference model which is known in the
art. The OSI network reference model is a layered model for
network communications. The purpose of each layer is to
offer certain services to the higher layers, while shielding
higher layers from the details of how the services offered by
lower layers are actually implemented. Each layer performs
a well defined function. The layer boundaries are chosen to






5,742,773 


minimize the information flow across the layer interfaces.
The functionality of each layer is generic enough to handle
a large number of diverse situations.

FIG. 3 shows an example of how data can be transmitted
using the OSI model known in the art. A sending process 48
has some data 50 it wants to send to the receiving process 52.
The sending process gives the data 50 to the application
layer 54 on the sending side, which attaches to the data an
application header 56 and then gives the resulting item to
the presentation layer 58. The presentation layer processes
and may transform the item, and passes the item to the
session layer 62. This process is repeated in the session
layer 62, which adds a session header 64; the transport
layer 66, which adds a transport header 68; the network
layer 70, which adds a network header 72; and the data link
layer 74, which adds a data link header 76. When the data
with all the attached headers finally reaches the physical
layer 78, it is transported to the receiving computer as
bits 80 over some physical medium. On the receiving
computer, the various headers which were added on the
sending side are systematically stripped off one by one at
each corresponding layer on the receiving side until the
original data item 50 reaches the receiving process 52.

The entities in OSI layer N implement a service used by
layer N+1 (with the physical layer being layer 1). Services
implemented by a layer are available at service access points
(SAPs). The layer N SAPs are the places where layer N+1
can access the services offered. In an operating system
which supports sockets, one variety of a SAP is the socket,
and the SAP address corresponds to the socket identifier.

As can be seen from FIG. 3, if the full OSI model is
implemented for network communications, there is
substantial processing overhead for two computers to
communicate, even if the amount of data they send is small.
As a practical matter, as is shown in FIG. 4A, many operating
systems combine the presentation and session layers 84, the
transport and network layers 86 and the data link and
physical layers 86. Even in this reduced configuration,
however, processing overhead in the combined
presentation/session layer and in the network/data link
layers can be very large. Data throughput is further slowed
because these layers are typically generic enough to allow
communication over a wide variety of different transport
mediums, necessarily entailing connecting to a wide variety
of different types of computer networks.

Even using the reduced layered communications scheme
shown in FIG. 4A, most attempts to implement simultaneous
voice/audio and data over standard modem hardware have
failed since the processing time within the layers (especially
across broadcast networks using bridges and routers) makes
the latency time for speech and other audio so large that
normal speech patterns are impractical. In addition, using
software compression/decompression techniques within the
layered communications scheme in FIG. 4A has not
improved the bandwidth problems since there is still
substantial processing overhead associated with executing
software designed to handle communications with a wide
variety of transport mediums.

In the preferred embodiment of the present invention, a
new approach to the layered communications scheme was
developed. This new scheme addresses both the latency and
bandwidth problems associated with voice/audio
transmission. As shown in FIG. 4B, the application layer 90
sits in the new scheme on top of a new presentation layer 92.
The new presentation layer 92 includes a socket layer 94 and
a transport layer 96. The presentation layer 92 sits on top of
the physical layer 94.

The presentation layer 92 has been modified to handle
only one type of communications, namely socket
communications. All functionality that does not directly
relate to socket communications has been stripped out and
discarded. This makes the presentation layer 92 small,
compact, efficient, and very fast.

In addition, not all possible socket types are used. Only
the datagram socket and the raw socket are used. The
datagram socket provides unreliable (i.e., no
acknowledgements (ACKs) or guaranteed delivery),
unsequenced data transfer. The raw socket provides access to
the underlying communications protocols (e.g., modem
protocols) and is also unreliable and unsequenced. The
datagram and raw sockets differ from the stream socket which
is normally used for communications, since the stream socket
provides reliable (acknowledged and guaranteed), sequenced
data transfer.

The use of a datagram (or raw) socket without
acknowledged data transfer helps eliminate a large portion of
the latency often encountered with prior art network
communications schemes. Since the socket does not have to
wait for acknowledgements or data sequencing, the
voice/audio data can be transmitted at a faster rate, making
the latency periods significantly smaller than would be
possible using an acknowledged data transfer. The tradeoff
for using unreliable data transmission to improve latency is
that a user may hear a garbled word from time to time or
noise on the line during voice or other audio transmission.
However, this is preferable to the latency problems
discussed above which disrupt normal speech patterns.

When a user wishes to establish an audio (e.g., voice or
high fidelity audio) channel connection using a modem
which already has an established data connection, a
voice-over-data application is started on the local computer.
This voice-over-data application creates a channel
through a socket interface. As is shown in FIG. 5, when there
is voice activity at an originating local audio input device
(e.g., microphone, telephone, etc.), the audio device
produces low voltage audio signals. The audio signals are
typically fed through an analog-to-digital converter (ADC)
(e.g., on a sound board), which samples the audio signal
thousands of times per second and translates the analog
audio information into digital format (i.e., 1's and 0's). The
faster the sampling rate, and/or the larger the word size used
to sample the analog signal, the better the quality of the
output sound.

For example, an 8 KHz sampling rate is used for
telephone quality voice transmission, 22 KHz is used for AM
radio sound quality, and 44.1 KHz is used for CD
(high fidelity) sound quality. Most sound boards permit a
minimum of 11 KHz, 22 KHz, and 44.1 KHz sampling (so
11 KHz sampling is typically used for voice transmission).

An audio codec is a combination of an audio coder and
decoder (i.e., audio compressor/decompressor). In the
preferred embodiment of the present invention, the True
Speech™ audio codec by the DSP Group of Santa Clara,
Calif. is used. However, any audio codec known in the art
(e.g., GSM 6.10 Audio Codec, CCITT G.711 A-Law and
u-Law codec, ADPCM codec, etc.) can be used in place of
the DSP Group codec. The DSP Group audio codec is
capable of compressing and decompressing data using a
number of different compression/decompression formats,
typically achieving compression ratios of up to 8:1.

Audio data needs to be compressed in the codec since just
a few seconds of audio input generates a huge amount of
audio data, and as much audio information as possible must
be transmitted in a given bandwidth to avoid the latency
problems discussed above. For example, for low fidelity
audio using an 11 KHz sampling rate with 16-bit sampling
resolution, a continuous 60 second sound sample produces
about 1.32 megabytes (MB) of data. However, for high
fidelity audio, using a 44.1 KHz sampling rate with 16-bit
sampling resolution, about 705,600 bits of data are produced
every second, or about 5.3 MB of data per minute. As a
result, it is necessary to compress audio data to permit fast
transmission.

Returning to FIG. 5, on receiving an audio signal, an
audio information driver 102 sends uncompressed audio
information to a local voice-over-data application 104.

In the preferred embodiment of the present invention, a
WAVE audio information driver is used. However, any other
audio information driver could also be used in place of the
WAVE audio information driver. The WAVE audio
information driver outputs audio information in the .WAV
format, as is known to those in the art. The .WAV format
stores digitally sampled waveforms from one or more
channels and permits a variety of sampling rates and bit
resolutions (depths) to be used. The .WAV format can store
compressed as well as uncompressed sampled audio signal
data. When there is activity at the local audio input
device 100, a plurality of uncompressed .WAV format packets
are sent to the WAVE driver by the audio hardware.

The WAVE driver 102 passes the .WAV information to the
local voice-over-data application 104. After receiving this
initial data, the voice-over-data application 104 establishes a
datagram (or raw) socket connection through a socket 106
(the details of which will be explained below) with a remote
voice-over-data application on a remote computer and
negotiates a compression format.

For example, the local voice-over-data application may
send the compression format (specified, e.g., by sampling
rate, bit resolution, etc.) to the remote voice-over-data
application. If a requested compression format is not
available on a remote codec on the remote machine, the
remote voice-over-data application will reject the request.

The local voice-over-data application will continue to
send compression format requests until the remote
application accepts a compression format request, or the
local application exhausts all of its known codec compression
formats. If a compression format can't be negotiated, then
the socket connection is closed, and an error message is
printed for the user. The negotiation scheme will be
explained in more detail below.

Each voice-over-data application maintains a listing of all
known compression/decompression formats that the
application is capable of using. If a user desires to replace
any existing audio codec, then the listing of compression
formats in the voice-over-data application is updated. The
listing of known compression/decompression formats in the
voice-over-data application provides a flexible scheme which
makes it relatively easy to substitute in a new audio codec
at any time, without changing the voice-over-data
application.

Once the voice-over-data application 104 has negotiated
the proper compression scheme, it contacts a local audio
codec 108 to use the negotiated compression format to
compress audio input data.

Once the socket connection is established and the
compression format has been negotiated, the voice-over-data
application takes the uncompressed .WAV data packets sent
to it by the audio information driver (i.e., the WAVE driver)
102 and sends them to the audio codec 108 for compression.
The audio codec 108 then sends the compressed .WAV data
to a socket (datagram or raw) 106, which is used to transport
the audio data. Since the voice-over-data application 104
sends the audio data to a socket 106, the underlying
communications technology (i.e., the modem driver and modem
in this case) can be replaced with other technologies
(e.g., network TCP/IP interfaces, etc.).

The socket has an associated modem socket driver 110.
Before the socket packet is sent, the voice-over-data
application 104 adds a socket header, called the
"wave-over-socket header," and sends the socket packet to
the socket 106, which sends it to the socket driver 110,
which sends it to a packet compressor/decompressor 112.
The packet compressor/decompressor 112 uses a lossless
data compression scheme, such as the LZS variant of the
Lempel Ziv technique, to compress the packet header
information to create a socket compressed packet. However,
other lossless compression schemes could also be used.

A socket interface also has the capability of handling
multiple input data streams. As a result, the socket 106
provides a standard way to mix not only audio, video and
data, but multiple audio streams as well. This feature has a
variety of important uses including giving location cues for
video conferencing applications (i.e., giving an auditory
indication of the location of the party speaking), and
providing full spatial imaging. For example, a video
conferencing application may provide a window on a computer
for each party connected to the video conference call for a
user. The windows allow a user to see and hear all the other
parties which are part of the video conference call. Location
cues are desirable for video conferencing since at any
instance of time, one or more of the parties to the video
conference will be silent. When a silent party begins to
speak, an immediate cue to the location of the party who has
started to speak is provided. The user can then immediately
focus his/her attention on the speaker who was previously
silent. Location cues help provide more natural speech
patterns during video conferencing since a user does not
have to spend time scanning the various video conference
party windows to determine which party has started to speak.
Full spatial imaging uses two or more audio channels to
convey a wide spectrum of audio source location information.

The packet compressor 112 uses the LZS variant
compression scheme to compress the packet header
(e.g., 60 bytes) down to a header of about one-tenth the
original size (e.g., 6-8 bytes). However, the packet
compressor could also compress the packet header to a larger
or smaller header size. In one embodiment of the present
invention, the codec compressed .WAV data packet is also
compressed again, but since the audio data has been
compressed once already by the codec, this second
compression does not significantly reduce the size of the
codec compressed .WAV data. Compressing the codec
compressed .WAV packet again is therefore not necessary on
most communications connections and can be skipped to
further improve latency. The compressed packet is now a
socket compressed packet.

Compressing the packet header significantly improves the
bandwidth for the audio connection. When there is actual
audio data, compression makes the packet header smaller in
size, allowing faster transmission in the limited bandwidth
available. When there is no audio data (e.g., momentary
silence during a voice connection), the datagram packet sent
(i.e., the packet header with a few data bytes indicating
silence) is significantly smaller (e.g., 6-8 bytes instead of
60), which dramatically increases transmission rates.

The socket compressed datagram packet is then passed to
a modem driver 114. The modem driver 114 adds any
necessary modem specific protocol to the packet. The packet
is then passed to a local modem 116 for transmission 118 to
a remote modem 120. In addition, the local modem 116 can
also compress the socket compressed packet before
transmission.
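
The effect of header compression on a small packet, as described above, can be estimated with a short calculation. The following C sketch uses hypothetical function names and assumes a 2-byte silence payload (the description says only "a few data bytes") on a 28,800 bit-per-second link; it ignores modem framing, error correction, and any compression performed by the modem itself.

```c
#include <stdint.h>

/* Bytes on the wire for one packet: header plus audio payload. */
uint32_t packet_bytes(uint32_t header_bytes, uint32_t payload_bytes)
{
    return header_bytes + payload_bytes;
}

/* Time in milliseconds, rounded up, to push the given number of bytes
   through a link running at bps bits per second. */
uint32_t wire_time_ms(uint32_t bytes, uint32_t bps)
{
    return (bytes * 8u * 1000u + bps - 1u) / bps;
}
```

Under these assumptions, a silence packet with an uncompressed 60-byte header occupies the link for about 18 milliseconds, while the same packet with a 6-byte compressed header occupies it for about 3 milliseconds.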

When the data arrives at the remote modem 120, it
follows a path which is the reverse of the path just described.
Any modem specific protocol is stripped off by a remote
modem driver 122, and the compressed packet is passed to
a remote socket driver 124. The remote socket driver 124
passes the compressed packet to the remote packet
decompressor 126, and the remote packet decompressor 126
decompresses the packet header. The decompressed packet
header and compressed .WAV data are passed to a remote
socket 128. The packet returned to the socket contains
the codec-compressed .WAV audio information along with a
decompressed wave-over-socket header. The codec
compressed audio information is passed to a remote
voice-over-data application 130. The remote voice-over-data
application 130 then strips the decompressed wave-over-socket
header. The remote voice-over-data application 130 sends
the codec-compressed audio packet to a remote audio codec
132 along with the appropriate decompression format to use
for decompression. The remote audio codec 132
decompresses the compressed audio packet using the
appropriate decompression format. The remote audio codec 132
passes the decompressed audio packet to the remote audio
information driver 134 (e.g., a WAVE driver), which passes
the audio information to a remote audio output device 136
for output.

The socket communications setup and compression negotiation
will now be explained in more detail. As is shown in
the flowchart in FIGS. 6A-B, after a local voice-over-data
application has been started, and there is activity at a local
audio device 138 (FIG. 6A), an audio information driver
(e.g., the WAVE driver) will be stimulated and contact the
local voice-over-data application 140. The voice-over-data
application will send a special packet of information, called
a STARTWAVE packet, to a local socket 142. The special
audio control packets sent by the WAVE driver have the
following format:



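typedef struct {
    DWORD        dwMessage;        /* STARTWAVE, ACCEPTWAVE, or BADFORMAT */
    uchar        uuidProtocol[ ];  /* socket UUID and protocol */
    WAVEFORMATEX wfx;              /* requested compression format */
} WAVESOCKETCONTROL;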



where dwMessage is of the type STARTWAVE,
ACCEPTWAVE, or BADFORMAT, uuidProtocol is the
socket UUID and protocol (e.g., datagram or raw), and wfx
is the compression format requested by the audio codec. In the
WAVEFORMATEX data structure (shown below),
wFormatTag defines the type of WAVE file; nChannels
is the number of channels in the wave, 1 for mono, 2 for
stereo; nSamplesPerSec is the sample rate
of the wave file; and nAvgBytesPerSec is the average data
rate. The nBlockAlign field is the block alignment (in bytes)
of the data; wBitsPerSample is the number of bits per sample
per channel; and cbSize is the size in bytes of the extra
information in the WAVE format header, not including the
size of the WAVEFORMATEX structure.
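
As a rough illustration of how the format fields described above relate to one another, the following C sketch fills in a format description for uncompressed PCM audio. The fixed-width stand-ins for WORD and DWORD, the WaveFormat type name, and the make_pcm_format helper are assumptions made for this example and are not part of the protocol described here. The derived-field arithmetic shown applies to PCM only; a compressed format would set these fields as its codec requires.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed stand-ins for the Windows WORD and DWORD types. */
typedef uint16_t WORD;
typedef uint32_t DWORD;

/* Format tag for uncompressed PCM (WAVE_FORMAT_PCM is 1 in the Windows headers). */
#define FORMAT_TAG_PCM 1

/* Illustrative mirror of the fields described in the text. */
typedef struct {
    WORD  wFormatTag;       /* format type */
    WORD  nChannels;        /* 1 for mono, 2 for stereo */
    DWORD nSamplesPerSec;   /* sample rate */
    DWORD nAvgBytesPerSec;  /* average data rate */
    WORD  nBlockAlign;      /* block alignment in bytes */
    WORD  wBitsPerSample;   /* bits per sample per channel */
    WORD  cbSize;           /* size of extra format information */
} WaveFormat;

/* Hypothetical helper: describe PCM audio with whole-byte samples. */
WaveFormat make_pcm_format(WORD channels, DWORD rate, WORD bits)
{
    WaveFormat f;

    assert(channels > 0 && bits > 0 && bits % 8 == 0);
    memset(&f, 0, sizeof f);
    f.wFormatTag      = FORMAT_TAG_PCM;
    f.nChannels       = channels;
    f.nSamplesPerSec  = rate;
    f.wBitsPerSample  = bits;
    /* For PCM, one block holds one sample for every channel. */
    f.nBlockAlign     = (WORD)(channels * (bits / 8));
    /* For PCM, the average data rate is the block size times the sample rate. */
    f.nAvgBytesPerSec = rate * f.nBlockAlign;
    f.cbSize          = 0;  /* PCM carries no extra format bytes */
    return f;
}
```

For instance, make_pcm_format(1, 8000, 16) describes 8 kHz, 16-bit mono audio with a block alignment of 2 bytes and an average data rate of 16,000 bytes per second.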

The voice-over-data application sends the audio control
message to an audio control socket defined by Windows®
95. Windows® 95 creates a socket for use by audio applications
and makes a Universal Unique Id (UUID) (i.e.,
socket id) known for audio control.
If Windows® 95 had not already created the audio control
socket, a socket create() (i.e., a datagram or raw socket) and
bind() would have to be done using the UUID (socket id) for
use by audio related applications. A socket driver associated
with the socket passes the audio control message to a modem
driver 144.

Any necessary modem specific protocol is then added by
a modem driver 146, and the data is then passed to a local
modem which transmits the data to a remote modem 148.

In one embodiment of the present invention, on the
remote computer side, an audio server application is
"listening" to the audio socket UUID for any input messages.
The audio server application is started automatically at boot
time when the Windows® 95 operating system boots up.
However, the audio server application could also be started
after the operating system has booted up on the remote
computer. When the STARTWAVE message is received, a
remote voice-over-data application would be launched by
the audio server application.

Returning to FIG. 6A, when the remote modem receives
the data sent to it by the local modem, the reverse operations
of those just described take place. The remote modem passes
the bits received to a remote modem driver 150 which strips
off any modem related protocol 152. The remote modem
driver passes the packet to a remote socket 154 (FIG. 6B),
and a socket driver passes the audio control message to a
remote voice-over-data application 156.

The remote voice-over-data application examines the
STARTWAVE message it received from the local voice-over-data
application. As is shown in the flowchart in FIGS.
7A-C, if the remote voice-over-data application understands
the compression format requested by the local voice-over-data
application 158 (FIG. 7A), the remote voice-over-data
application replies to the local voice-over-data application
with an audio control packet with the dwMessage field set to
ACCEPTWAVE 160. The ACCEPTWAVE packet notifies
the local voice-over-data application that the given compression
format (specified in the wfx field) and UUID have
been accepted by the remote application for audio traffic.
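
The accept-or-counter exchange of FIGS. 7A-C, including the BADFORMAT reply and the retry behavior set out in the paragraphs that follow, might be sketched in C roughly as below. This is only an illustration under stated assumptions: the numeric message values, the reduced ControlMsg type (which identifies a format by a single integer rather than a full WAVEFORMATEX), and the function names are hypothetical, and the modem and socket transport is replaced by a direct function call.

```c
#include <assert.h>
#include <stddef.h>

/* Message types named in the text; the numeric values are assumptions. */
enum { STARTWAVE = 1, ACCEPTWAVE = 2, BADFORMAT = 3 };

/* Hypothetical reduced control message: one integer stands in for wfx. */
typedef struct {
    int dwMessage;
    int format;
} ControlMsg;

/* Returns 1 if format appears in the list, 0 otherwise. */
static int supports(const int *formats, size_t n, int format)
{
    for (size_t i = 0; i < n; i++)
        if (formats[i] == format)
            return 1;
    return 0;
}

/* Remote side: accept a known format, otherwise counter with the
 * remote's first-listed (assumed preferred) format. */
ControlMsg remote_reply(ControlMsg request, const int *remote_formats, size_t n)
{
    ControlMsg reply;

    assert(remote_formats != NULL && n > 0);
    if (supports(remote_formats, n, request.format)) {
        reply.dwMessage = ACCEPTWAVE;
        reply.format = request.format;
    } else {
        reply.dwMessage = BADFORMAT;
        reply.format = remote_formats[0];
    }
    return reply;
}

/* Local side: try each known format in turn. Returns the agreed
 * format, or -1 if every format has been tried without a match. */
int negotiate(const int *local_formats, size_t nl,
              const int *remote_formats, size_t nr)
{
    for (size_t i = 0; i < nl; i++) {
        ControlMsg start = { STARTWAVE, local_formats[i] };
        ControlMsg reply = remote_reply(start, remote_formats, nr);

        if (reply.dwMessage == ACCEPTWAVE)
            return reply.format;
        /* BADFORMAT names the remote's desired format; if the local side
         * can use it, the local side would answer with ACCEPTWAVE. */
        if (reply.dwMessage == BADFORMAT &&
            supports(local_formats, nl, reply.format))
            return reply.format;
    }
    return -1;  /* negotiation failed; an error would be reported to the user */
}
```

With local formats {10, 20} and remote formats {20, 30}, the first STARTWAVE (format 10) draws a BADFORMAT reply naming format 20, which the local side can use, so negotiation settles on 20 after a single round trip.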

The WAVEFORMATEX data structure referred to above is as follows:

typedef struct waveformat_extended_tag {
    WORD  wFormatTag;       /* format type */
    WORD  nChannels;        /* number of channels (i.e., mono, stereo...) */
    DWORD nSamplesPerSec;   /* sample rate */
    DWORD nAvgBytesPerSec;  /* for buffer estimation */
    WORD  nBlockAlign;      /* block size of data */
    WORD  wBitsPerSample;   /* # of bits per sample of mono data */
    WORD  cbSize;           /* the count in bytes of the extra size */
} WAVEFORMATEX;

If the requested compression format is unavailable for use
by the remote codec, then the remote application responds with
a control packet having the dwMessage set to
BADFORMAT, and the wfx field set to the compression
format desired by the remote voice-over-data application 162.
When the local voice-over-data application receives a
return message from the remote voice-over-data application,
the message type is checked 164 (FIG. 7B). If the return
message type is BADFORMAT, then a check is done to see
if the local application can use the compression format
described in the BADFORMAT return message 166. If the
local application can use the requested compression format, the
local application will send an ACCEPTWAVE packet to the
remote application 168. The ACCEPTWAVE packet has the
dwMessage set to ACCEPTWAVE, and the wfx set to the
compression format returned from the remote voice-over-data
application.

If the local application doesn't support the requested
compression format, it will try to send another compression
format it knows about to the remote voice-over-data application.
If all known compression formats have not been tried
170, the local voice-over-data application gets the next
compression format it knows about 172 and sends a STARTWAVE
message with this compression format 174 and waits
for a response from the remote voice-over-data application
164. The local voice-over-data application will send additional
STARTWAVE audio control messages with a different
compression format (170-174) until it exhausts all of its
formats, or finds a format acceptable to the remote
voice-over-data application 158 (FIG. 7A). If a match is not found
(i.e., the local and remote voice-over-data applications did not
negotiate a compression format), then an error message is
printed for the user 178 (FIG. 7B) and socket setup and
compression negotiation ends.

In another embodiment of the present invention, all available
compression formats are assigned a numerical value.
One STARTWAVE audio control message is sent to the
remote application with all the numerical values representing
known compression schemes to allow a compression
scheme to be chosen. The remote application sends back one
ACCEPTWAVE message with a prioritized list of numerical
values of compression schemes it will accept. If the local
application can use any of the codecs in the prioritized list
of codecs, another STARTWAVE audio control message is
sent with the compression scheme which the local application
will use. This embodiment further reduces latency by
reducing the number of compression negotiation messages
sent.

In yet another embodiment of the present invention, a list
of default compression schemes is maintained by the local
and remote application. The local application sends a single
STARTWAVE message to the remote application requesting
one of the default compression schemes. The remote application
sends a single ACCEPTWAVE message back to the
local application including the default compression scheme
which should be used. This embodiment further reduces
latency by reducing the number of compression negotiation
messages sent.

Once the local application receives the ACCEPTWAVE
audio packet from the remote application 176, it performs a
socket connect() to the UUID returned with the ACCEPTWAVE
from the remote machine 180 (FIG. 7C). After
sending an ACCEPTWAVE message 160 (FIG. 7A), the
remote application performs a socket accept() using the
UUID specified in the STARTWAVE packet from the local
application 182 (FIG. 7C).

There is now a one-way connection between the local and
remote applications which can be used to send audio data
from the local to the remote application using the negotiated
compression format 184. The socket communications setup
and compression negotiation is now complete. If two-way
communication is desired, the remote voice-over-data application
must initiate its own audio channel (with the same or
a different audio compression format), using the audio
control messages and the steps just described.

Asymmetric compression schemes may also be
used over the socket audio channel to allow high quality
audio in one direction, and low quality in the other. For
example, if a user connects to
an electronic record store and wishes to sample new CDs,
the connection from the record store to the user would be a
high quality audio channel to allow high fidelity audio (i.e.,
compressed with a higher bit-rate codec) to be transmitted.
A lower quality (i.e., a lower bit-rate codec) channel might
be set up between the user and the electronic record store to
permit the user to speak with the record store agent to
purchase a CD. However, the local and remote codecs must
be capable of understanding both the low and high quality
compression formats for this scenario to work.

Once a socket connection has been established between
the local and remote voice-over-data applications, codec
compressed audio data can then be sent using the negotiated
format as is shown in the flowchart in FIGS. 8A-8C. Upon
further activity at the local audio input device, the audio
signal data in digital format is passed by an audio information
driver (e.g., the WAVE driver) to a local voice-over-data
application. The local voice-over-data application passes the audio
data to a local audio codec to be compressed. The
local codec compressed .WAV data is passed to an
audio socket (which is a datagram or raw socket)
using the UUID returned from the negotiations with the remote
voice-over-data application.

The local voice-over-data application adds a simple
header, called a wave-over-sockets header, to each
packet of codec compressed .WAV audio data received. The
wave-over-sockets header has the following format:

typedef struct {
    DWORD dwLength;
    BYTE  data[ ];
} WAVEPACKET;

where dwLength is the number of bytes in the packet, including the
header, and data is the actual codec compressed .WAV data
formatted using the WAVEFORMATEX structure for .WAV
audio data described above. The local voice-over-data application
passes the codec compressed .WAV data with the
wave-over-sockets header to the audio control socket 194.
The audio control socket passes the codec compressed .WAV
data with the wave-over-sockets header to a socket driver
196.

The modem socket driver passes the codec compressed
.WAV data with the wave-over-sockets header to a packet
compressor 196 which compresses the packet header. The
packet compressor then passes the compressed packet
(which hereinafter will be called a socket compressed packet
to avoid confusion with the codec compressed packet) to a
local modem driver 198 which makes the packet ready for
modem transmission by adding a modem specific protocol
200. (However, adding a modem specific protocol may not
be necessary in every instance.) The local modem driver
then passes the socket compressed packet to a local modem
202 which transmits the packet as bits to the remote modem
204 (FIG. 8B).

On the remote side, a remote modem receives the bits 206,
and a remote modem driver strips any specific protocol
information 208. The remote modem device driver then
passes the socket compressed packet to a remote packet
decompressor 210. The remote packet decompressor decompresses
the wave-over-sockets packet header. After
decompression, the remote packet decompressor passes the
packet to a remote socket driver 212. The packet now
consists of an uncompressed wave-over-sockets header and
the codec compressed .WAV audio information. The remote socket
driver passes the packet to a remote voice-over-data application
214. The remote voice-over-data application strips the
wave-over-sockets header 216 and passes the codec compressed
.WAV audio information to a remote audio codec
218.
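
The adding and stripping of the wave-over-sockets header described above might look roughly like the following C sketch, in which the sender prefixes the codec compressed audio with a length field that counts the header itself and the receiver validates that field before handing the payload onward. The function names, the 4-byte width of the length field, and the little-endian byte order are assumptions made for this example; the text does not specify a wire byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HEADER_BYTES 4u  /* assumed size of the dwLength field */

/* Builds a packet: a length field (header included) followed by the
 * codec compressed audio. Returns a malloc'd buffer the caller must
 * free, or NULL on allocation failure; *out_len receives the size. */
uint8_t *wave_packet_build(const uint8_t *audio, uint32_t audio_len,
                           uint32_t *out_len)
{
    uint32_t total;
    uint8_t *pkt;

    assert(out_len != NULL);
    assert(audio != NULL || audio_len == 0);
    assert(audio_len <= UINT32_MAX - HEADER_BYTES);

    total = HEADER_BYTES + audio_len;
    pkt = malloc(total);
    if (pkt == NULL)
        return NULL;

    /* Length field, written least significant byte first (assumed). */
    pkt[0] = (uint8_t)(total & 0xFFu);
    pkt[1] = (uint8_t)((total >> 8) & 0xFFu);
    pkt[2] = (uint8_t)((total >> 16) & 0xFFu);
    pkt[3] = (uint8_t)((total >> 24) & 0xFFu);
    if (audio_len > 0)
        memcpy(pkt + HEADER_BYTES, audio, audio_len);

    *out_len = total;
    return pkt;
}

/* Strips the header. Returns a pointer to the audio inside pkt and sets
 * *audio_len, or returns NULL if the length field does not match the
 * number of bytes actually received. */
const uint8_t *wave_packet_strip(const uint8_t *pkt, uint32_t pkt_len,
                                 uint32_t *audio_len)
{
    uint32_t total;

    assert(pkt != NULL && audio_len != NULL);
    if (pkt_len < HEADER_BYTES)
        return NULL;

    total = (uint32_t)pkt[0] | ((uint32_t)pkt[1] << 8) |
            ((uint32_t)pkt[2] << 16) | ((uint32_t)pkt[3] << 24);
    if (total != pkt_len)
        return NULL;

    *audio_len = total - HEADER_BYTES;
    return pkt + HEADER_BYTES;
}
```

Because the length field counts the header as well as the data, a receiver on a datagram socket can compare it with the datagram size to detect a truncated packet before passing the audio to the codec.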

The remote voice-over-data application passes the codec
compressed .WAV audio information to the remote audio
codec with instructions on what decompression format to
use (FIG. 8C). The audio codec decompresses the compressed
audio information using the negotiated decompression
format 220 and passes the information to a remote audio
output device 222. The remote audio output device will
convert the digital information supplied by the codec back
into an analog audio signal with a digital-to-analog converter
(DAC). The analog audio signal is then used to output
the audio information on an audio output device (e.g., a
speaker, telephone, etc.). The steps shown in FIGS. 8A-8C
continue as long as there is activity at the local audio input
device (e.g., microphone or telephone).

At any time, either side may terminate the audio connection
by calling the socket utility close(). The socket connection
will then be dropped. Both sides can then listen to
the audio control UUID to see if another connection is
desired at some later time.

Using the new scheme for voice-over-data just described,
the average latency of a 250 millisecond True Speech™
compressed audio packet has been reduced to an average
latency of less than 100 milliseconds (plus the compression/
decompression time). In contrast, using the voice/data
schemes described in the Background section, a 250 millisecond
True Speech™ compressed audio packet transmitted
over a modem capable of 28,800 bits-per-second
transmission would have an average latency in the 200-300
millisecond range. Thus, the new voice-over-data scheme
provides a 300%-400% improvement in latency (which is
dependent on the speed of the host computers) and permits
voice, with normal speech patterns, to be sent over a modem
data connection using a standard telephone line.
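
To give a feel for how much of such a latency budget is taken up by the modem link itself, the following C sketch estimates the time needed to clock one compressed audio packet onto the line. The function names are hypothetical, and the codec bit rate passed in is an assumption supplied by the caller, since the text above does not state one. The estimate ignores packet headers, modem framing and modem compression.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes produced by a codec running at codec_bps for duration_ms,
 * rounded up to a whole byte. */
uint32_t codec_bytes(uint32_t duration_ms, uint32_t codec_bps)
{
    /* duration_ms * codec_bps is the bit count scaled by 1000. */
    uint64_t bits_x1000 = (uint64_t)duration_ms * codec_bps;
    return (uint32_t)((bits_x1000 + 7999u) / 8000u);
}

/* Microseconds needed to send the given number of bytes over a line
 * running at line_bps, counting payload bits only. */
uint32_t wire_time_us(uint32_t bytes, uint32_t line_bps)
{
    assert(line_bps > 0);
    return (uint32_t)(((uint64_t)bytes * 8u * 1000000u) / line_bps);
}
```

As an example under an assumed codec rate of 8,500 bits per second, codec_bytes(250, 8500) gives 266 bytes for a 250 millisecond packet, and wire_time_us(266, 28800) gives 73,888 microseconds, or roughly 74 milliseconds on a 28,800 bits-per-second line.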

It should be understood that the programs, processes, or
methods described herein are not related or limited to any
particular type of computer apparatus, unless indicated
otherwise. Various types of general purpose or specialized
computer apparatus may be used which perform operations
in accordance with the teachings described herein.

Having illustrated and described the principles of the
present invention in a preferred embodiment, it should be
apparent to those skilled in the art that the embodiment can
be modified in arrangement and detail without departing
from such principles. For example, elements of the preferred
embodiment shown in software may be implemented in
hardware and vice versa. Hardware and software components
can be interchanged with other components providing the
same functionality.

In view of the wide variety of embodiments to which the
principles of our invention can be applied, it should be
understood that the illustrated embodiments are exemplary
only, and should not be taken as limiting the scope of our
invention.
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We therefore claim as our invention all that comes within 
the scope and spirit of the following claims: 
We claim:

1. In a computer system having a first computer in 
communications with a remote second computer over a data 
link, a method of creating a virtual two channel audio 
connection over the data link, the two channel connection 
including a higher quality audio channel, and a lower quality 
audio channel, the method comprising the following steps: 

maintaining on the first computer a first list of available
audio data compression and decompression processes,
the first list including higher quality, and lower quality
audio data compression and decompression processes;

maintaining on the remote second computer a second list
of available audio data compression and decompression
processes, the second list including higher quality,
and lower quality audio data compression and
decompression processes;

negotiating between the remote second computer and the 
first computer a high quality audio compression and 
decompression process to be used for the higher quality 
audio channel and a low quality audio compression and 
decompression process to be used for the lower quality 
audio channel; 

sending higher quality audio data from the first computer 
to the remote second computer over the data link using 
the negotiated high quality audio compression process, 
the data link thereby serving as a high quality, virtual 
audio channel; and 

sending lower quality audio data from the remote second
computer to the first computer over the data link using
the negotiated low quality audio compression process,
the data link thereby also serving as a low quality
virtual audio channel.

2. A computer readable medium having stored therein 
instructions for causing the first computer to perform the 
method of claim 1. 

3. The method of claim 1 wherein the lower quality audio 
channel is used to send voice data. 

4. The method of claim 1 where the first list of available 
audio data compression and decompression processes on the 
first computer is identical to the second list of available 
audio data compression and decompression processes on the 
remote second computer. 

5. The method of claim 1 where the first list of available 
audio data compression and decompression processes on the 
first computer is not identical to the second list of available 
audio data compression and decompression processes on the 
remote second computer. 

6. The method of claim 1 in which said data link serves 
concurrently to provide both said higher quality and lower 
quality virtual audio channels. 

7. The method of claim 1 which includes employing a 
socket to establish the data link. 

8. The method of claim 7 wherein the step of establishing 
an audio connection using a socket includes using a data- 
gram socket. 

9. The method of claim 1 wherein the high quality audio 
data is sampled at a rate greater than 11 KHz. 

10. The method of claim 1 wherein the low quality audio 
data is sampled at a rate less than or equal to 11 KHz. 

11. The method of claim 1 further comprising:
maintaining on the remote second computer a second list
of available audio data compression and decompression
processes, the second list including at least one but
less than all of the high quality, and low quality audio
data compression and decompression processes contained
in the first list.

12. In a system having a first computer in communications
with a second remote computer over a data link, a method
of establishing a virtual plural channel audio connection
therebetween, where each of said audio channels is capable
of sending two or more streams of compressed audio data,
the method comprising the following steps:

negotiating between the first computer and the remote
second computer a number of audio channels to
establish, wherein said number of audio channels is two
or more;

negotiating between the first computer and the remote
second computer an audio data compression and
decompression process for each of said number of
audio channels;

transmitting audio data from the first computer to the
remote second computer in two or more compressed
audio data streams over each of said virtual audio
channels using the audio compression process negotiated
therefor.

13. The method of claim 12 which includes employing a 
socket to establish the data link. 

14. The method of claim 13 wherein said socket is a 
datagram socket. 

15. The method of claim 12 wherein the virtual audio 
channels are used to provide audio location cues for tele- 
conferencing. 

16. The method of claim 12 wherein the virtual audio 
channels are used to provide full spatial imaging cues for 
teleconferencing. 

17. A computer readable medium having stored therein 
instructions for causing a first computer to perform the 
method of claim 12. 

* * * * * 
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